External Tertiary-Care-Hospital Validation of the Epidemiological SEER-Based Nomogram Predicting Downgrading in High-Risk Prostate Cancer Patients Treated with Radical Prostatectomy

We aimed to externally validate the SEER-based nomogram used to predict downgrading in biopsied high-risk prostate cancer patients treated with radical prostatectomy (RP) in a contemporary European tertiary-care-hospital cohort. We relied on an institutional tertiary-care database to identify biopsied National Comprehensive Cancer Network (NCCN) high-risk prostate cancer patients who underwent RP between January 2014 and December 2022. The model's performance in predicting downgrading was evaluated using accuracy and calibration. The net benefit of the nomogram was tested with decision-curve analyses. Overall, 241 biopsied high-risk prostate cancer patients were identified. In total, 51% were downgraded at RP. Moreover, of the 99 patients with a biopsy Gleason pattern of 5, 43% were significantly downgraded to RP Gleason pattern ≤ 4 + 4. The nomogram predicted any downgrading with 72% accuracy, and the predicted and observed downgrading rates showed a high level of agreement. In the prediction of significant downgrading from a biopsy Gleason pattern of 5 to an RP Gleason pattern ≤ 4 + 4, the accuracy was 71%. Deviations from the ideal predictions were noted for predicted probabilities between 30% and 50%, where the nomogram overestimated the observed rate of significant downgrading. This external validation of the SEER-based nomogram confirmed its ability to predict downgrading in biopsied high-risk prostate cancer patients and supports its use for patient counseling in high-volume RP centers.


Introduction
In prostate cancer, the discrepancy between the biopsy Gleason score and the radical prostatectomy (RP) Gleason score is a well-known phenomenon [1][2][3][4][5][6][7][8]. In particular, upgrading has been thoroughly studied in several single- and multi-institutional, as well as

Our institutional prospectively collected prostate cancer database was used to retrospectively identify biopsy-confirmed patients with NCCN high-risk prostate cancer (clinical tumor stage (cT) 3a or Gleason score 8-10 or prostate-specific antigen (PSA) > 20 ng/mL) treated with RP between January 2014 and December 2022 [19]. Only patients with biopsy Gleason pattern ≥ 3 + 4 were included, since downgrading is not possible from biopsy Gleason pattern 3 + 3. Moreover, in line with the methodology of the initially published nomogram, only prostate cancer patients with eight to twenty-four biopsy cores sampled and PSA ≤ 50 ng/mL were included [20][21][22]. Exclusion criteria consisted of unknown PSA at biopsy, unknown cT stage, unknown biopsy Gleason pattern and unknown number of positive biopsy cores. Patients with neoadjuvant androgen-deprivation therapy (ADT) and clinical suspicion of metastases were also excluded (Figure 1). A biopsy Gleason score was determined for each pathological stain, defined by the highest and worst Gleason pattern. Consequently, the highest Gleason score of all stains was defined as the biopsy Gleason score [23].
Ethical approval was obtained from the institutional review boards of the University Cancer Center Frankfurt (UCT) and the Ethical Committee at the University Hospital Frankfurt, and written informed consent was obtained from all patients.

[Figure 1: patient-selection flowchart]
- Prostate cancer patients treated with radical prostatectomy, n = 1395
- Included: only high-risk and very high-risk prostate cancer patients, n = 426
- Exclusion criteria: biopsy Gleason pattern 3 + 3 (n = 12); fewer than eight or more than 24 biopsy cores sampled (n = 43); PSA > 50 ng/mL (n = 46); unknown PSA at biopsy (n = 0); unknown cT stage (n = 11); unknown biopsy Gleason pattern (n = 0); unknown prostatectomy Gleason pattern (n = 20); unknown cores sampled (n = 0); unknown number of positive cores (n = 1); clinical suspicion of metastases (n = 24); neoadjuvant androgen-deprivation therapy (n = 28)
- Study cohort, n = 241

Statistical Analyses
Descriptive statistics were presented using frequencies for categorical variables and medians with interquartile ranges (IQR) for continuous variables. External validation was derived from the initial odds ratios (ORs) and intercepts of the above-mentioned covariates in the study by Wenzel et al. [15]. As recommended by the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement, external validation of the predictive model was evaluated in terms of discrimination, calibration, and net benefit [24]. Discrimination was quantified using accuracy, as well as the area under the curve (AUC) and bootstrap 95% confidence interval (CI) from the receiver operating characteristic (ROC) curve. Calibration was assessed using calibration-in-the-large (CILR) and calibration slope. The extent of over- and underestimation was graphically described using calibration plots. Decision-curve analysis (DCA) was used to evaluate the net benefit of the developed models. Finally, systematic analyses of several possible model probability cut-offs were performed. The R software environment for statistical computing and graphics (version 4.1.2) was used for all analyses [25].
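As an illustration of the validation metrics named above, the following minimal sketch shows how a published logistic model can be applied to new patients and how discrimination and a crude calibration-in-the-large can be computed. It is written in Python for illustration only (the analyses in this study were performed in R), and all function names, coefficients, and data are hypothetical; they are not the coefficients of the nomogram by Wenzel et al. The calibration-in-the-large shown here is the simple difference between the observed event rate and the mean predicted probability, which is an assumption of this sketch rather than the exact estimator used in the study.

```python
import math

def predict_probability(intercept, log_odds_ratios, covariates):
    """Apply a logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        b * x for b, x in zip(log_odds_ratios, covariates)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def auc(probs, outcomes):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    score = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                score += 1.0
            elif p_event == p_non_event:
                score += 0.5
    return score / (len(events) * len(non_events))

def calibration_in_the_large(probs, outcomes):
    """Crude version: observed event rate minus mean predicted probability."""
    return sum(outcomes) / len(outcomes) - sum(probs) / len(probs)

# Hypothetical single-covariate model with an odds ratio of 2.0 and intercept 0.
example_probability = predict_probability(0.0, [math.log(2.0)], [1])

# Hypothetical predicted probabilities and observed outcomes (1 = downgraded).
toy_probs = [0.2, 0.4, 0.6, 0.8]
toy_outcomes = [0, 1, 0, 1]
toy_auc = auc(toy_probs, toy_outcomes)
toy_citl = calibration_in_the_large(toy_probs, toy_outcomes)
```

In this toy example, three of the four event/non-event pairs are ranked correctly, so the AUC is 0.75, and the mean predicted probability equals the observed event rate, so the crude calibration-in-the-large is zero.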

External Validation of the SEER-Based Nomogram Predicting Any Downgrading in NCCN Biopsied High-Risk Prostate Cancer Patients
The published and now externally validated nomogram predicted any downgrading in our external high-volume tertiary-care RP cohort with 72% accuracy (AUC = 0.72, Figure 2A). On the calibration plot, a high agreement between the predicted and observed downgrading rates was observed across all the probabilities (Figure 3A). In the DCA, the use of the nomogram resulted in greater net benefit for the threshold probabilities between 0.50 and 0.75, relative to both competing strategies (treat no NCCN biopsied high-risk prostate cancer patients and treat all NCCN biopsied high-risk prostate cancer patients, Figure 3B).
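The net benefit compared in a decision-curve analysis can be sketched as follows, using the standard definition (true positives per patient minus false positives per patient weighted by the odds of the threshold probability). This Python sketch is illustrative only; the function names are hypothetical, and the inputs are the counts reported in this article for the 60% cutoff (78 patients above the cutoff, 55 of them downgraded, cohort of 241) together with the rounded overall downgrading rate of 51%. Whether the published decision curves were built from exactly these counts is an assumption.

```python
def net_benefit(true_positives, false_positives, cohort_size, threshold):
    """Net benefit of a model at a given threshold probability."""
    weight = threshold / (1.0 - threshold)
    return true_positives / cohort_size - false_positives / cohort_size * weight

def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of classifying every patient as positive."""
    weight = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * weight

# Counts at the 60% cutoff as reported in the text (assumed to apply to the DCA).
nomogram_nb = net_benefit(true_positives=55, false_positives=23,
                          cohort_size=241, threshold=0.60)
# Overall downgrading rate of 51% (rounded value from the text).
treat_all_nb = net_benefit_treat_all(prevalence=0.51, threshold=0.60)
treat_none_nb = 0.0  # classifying nobody as positive yields no benefit and no harm

print(f"nomogram: {nomogram_nb:.3f}, treat all: {treat_all_nb:.3f}, "
      f"treat none: {treat_none_nb:.3f}")
```

Under these assumptions the nomogram's net benefit at a threshold of 0.60 is positive, while the treat-all strategy is negative, which is consistent with the reported advantage of the nomogram in the 0.50 to 0.75 threshold range.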


Nomogram Cutoffs for Any-Downgrading Predictions
In Table 2 (part A), various nomogram cutoffs for any downgrading are displayed, according to the numbers and percentages of patients who were classified correctly (true positive) vs. those who were classified incorrectly (false positive) by the nomogram in our cohort. The predicted probabilities ranged from 1% to 80%. As suggested and discussed by Wenzel et al., the probability cutoff for the initial nomogram was 60% [15]. Using this suggested cutoff in the current cohort, we identified 78 patients (32.4%) who were above this cutoff, indicating a higher risk of any downgrading. Of these 78 patients, 55 (70.5%) exhibited any downgrading (true positive), while 23 (29.5%) did not exhibit any downgrading at RP (false positive). When a higher cutoff of 70% was used, 28 patients (11.6%) above the cutoff were identified. Of these 28 patients, 21 (75.0%) exhibited any downgrading (true positive), while 7 patients (25.0%) did not exhibit any downgrading at RP (false positive). Finally, when using a lower cutoff of 10%, 219 patients (90.9%) above the cutoff were identified. Of these 219 patients, 123 (56.2%) exhibited any downgrading (true positive), while 96 (43.8%) did not exhibit any downgrading at RP (false positive).
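The percentages reported for each cutoff follow directly from the counts, as the short Python sketch below illustrates. The function and variable names are hypothetical; the counts are those stated in the text for the cohort of 241 patients.

```python
def summarize_cutoff(flagged, true_positives, cohort_size):
    """Percentages for patients above a cutoff, rounded to one decimal place."""
    false_positives = flagged - true_positives
    return {
        "flagged_pct": round(100 * flagged / cohort_size, 1),
        "true_positive_pct": round(100 * true_positives / flagged, 1),
        "false_positive_pct": round(100 * false_positives / flagged, 1),
    }

COHORT_SIZE = 241
# cutoff -> (patients above the cutoff, patients above the cutoff who were downgraded)
REPORTED_COUNTS = {0.60: (78, 55), 0.70: (28, 21), 0.10: (219, 123)}

summary = {
    cutoff: summarize_cutoff(flagged, true_positives, COHORT_SIZE)
    for cutoff, (flagged, true_positives) in REPORTED_COUNTS.items()
}

for cutoff, row in summary.items():
    print(f"cutoff {cutoff:.0%}: {row}")
```

Raising the cutoff from 10% to 70% shrinks the flagged group from 90.9% to 11.6% of the cohort while the share of truly downgraded patients among those flagged rises from 56.2% to 75.0%, which is the trade-off the cutoff analysis is meant to expose.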

Table 2. Analyses of nomogram cutoffs in (A) the cohort of 241 high-risk prostate cancer patients treated with radical prostatectomy (RP) predicting any downgrading between biopsy and RP and (B) the cohort of 99 high-risk prostate cancer patients treated with RP predicting significant downgrading from any biopsy-based primary or secondary Gleason pattern 5 to Gleason pattern ≤ 4 + 4 at RP.

External Validation of the SEER-Based Nomogram Predicting Significant Downgrading in NCCN Biopsied High-Risk Prostate Cancer Patients
The published and now externally validated nomogram predicted significant downgrading with 71% accuracy (AUC = 0.71, Figure 2B). On the calibration plot, deviations from the ideal predictions were noted for predicted probabilities between 30% and 50%, where the nomogram overestimated the observed rate of significant downgrading (Figure 3C). In the DCA, the use of the nomogram resulted in a greater net benefit for threshold probabilities between 0.45 and 0.75, relative to both competing strategies (treat no NCCN high-risk prostate cancer patients or treat all NCCN high-risk prostate cancer patients, Figure 3D).

Nomogram Cutoffs for Significant-Downgrading Predictions
In Table 2 (part B), various nomogram cutoffs for significant downgrading are displayed, according to the numbers and percentages of patients who were classified correctly (true positive) vs. those who were classified incorrectly (false positive). The predicted probabilities ranged from 1% to 80%. The probability cutoff suggested for the initial nomogram developed by Wenzel et al. was 50%. In the current externally validated RP cohort, we identified 25 patients (25.3%) who were above this suggested cutoff, indicating a higher risk of significant downgrading. Of these 25 patients, 17 (68.0%) exhibited significant downgrading (true positive), while 8 (32.0%) did not exhibit significant downgrading at RP (false positive).

Discussion
The data that can be used to study downgrading, especially in biopsied NCCN high-risk prostate cancer patients, are limited. Therefore, the SEER-based nomogram for the prediction of downgrading in biopsied NCCN high-risk prostate cancer patients treated with RP was developed by Wenzel et al. In the present study, we externally validated this nomogram within a contemporary, external European high-volume tertiary RP cohort and made several important observations. First, we observed an overall downgrading rate of 51% in the current biopsied NCCN high-risk prostate cancer patients treated with RP. Moreover, significant downgrading was observed in 43% of the cases (from any Gleason pattern of 5 to Gleason scores ≤ 4 + 4). The downgrading rates in our study (51% for any downgrading and 43% for significant downgrading) were in agreement with the rates reported by Wenzel et al. (50% for any downgrading and 44% for significant downgrading) [15]. Previously reported downgrading rates in other tertiary-care-based RP cohorts of patients with Gleason grade group 4 ranged from 45% to 61.5% [26][27][28]. However, our rates cannot be directly compared with those in previous downgrading reports, since no other reports focused only on biopsied high-risk prostate cancer patients while also including Gleason grade group 5. Surprisingly, the downgrading rates reported with the initial nomogram derived from the epidemiological SEER database are not significantly different from our reported downgrading rates, even though the SEER database's pathological reports do not undergo central pathological review. It could have been hypothesized that, given the known phenomenon of discrepancies in pathological biopsy-core reviews, the SEER-based rates would differ from our rates because of differences in pathologists' experience across United States registries.
Second, we made important observations when comparing the tumor characteristics of biopsied NCCN high-risk prostate cancer patients from the current European high-volume tertiary-care RP cohort with those of the SEER-based NCCN high-risk prostate cancer cohort. For example, important differences were observed in the patients with organ-confined disease. Specifically, the rate of cT1c was significantly lower in the current cohort than in the SEER-based cohort (35% vs. 52%). Conversely, the rate of cT2 was higher in the current cohort than in the SEER-based cohort (56% vs. 36%). However, the rates of clinical suspicion of non-organ-confined disease (≥cT3) were comparable between the two cohorts (9% vs. 12%), which may be explained by the rarity of patients undergoing RP with ≥cT3 on digital rectal examination due to the higher risk of surgical complications [29]. Moreover, we observed comparable biopsy-Gleason-score distributions in the current cohort and the SEER-based cohort. Specifically, a biopsy Gleason score of 4 + 4 was the most frequent and 5 + 3 was the least frequent score in both cohorts. Conversely, important differences were observed in the RP-Gleason-score distribution. Specifically, in the current cohort, the rate of RP Gleason scores of 4 + 5 was higher than in the SEER-based cohort (28% vs. 20%), whereas the rate of RP Gleason scores of 4 + 4 was lower (8% vs. 14%).
Third, the external validation of the SEER-based nomogram resulted in a 72% accuracy for any downgrading at RP. The internal validation by Wenzel et al. resulted in a 71% accuracy for any downgrading [15]. Consequently, the externally validated accuracy based on our European tertiary-care-hospital cohort marginally exceeded the initial internally validated accuracy obtained by Wenzel et al. The external validation of the SEER-based nomogram predicting any downgrading is therefore clinically important, since it confirms that the nomogram is robust and may be generalizable to a European RP cohort of biopsied NCCN high-risk prostate cancer patients. Moreover, it indicates that an easily applicable nomogram based on clinical characteristics can be accurately used to predict the likelihood of any downgrading. Specifically, the threshold of 60% initially suggested by Wenzel et al. seems adequate, as it classifies 32.4% of patients as at risk of downgrading at final pathology, of whom 70.5% were truly downgraded at RP. Using the same cutoff of 60%, Wenzel et al. reported a lower rate of actually downgraded RP patients, of 67.5% [15].
Finally, the external validation of the SEER-based nomogram resulted in a 71% accuracy for significant downgrading. The internal validation by Wenzel et al. resulted in a 68% accuracy for significant downgrading [15]. Consequently, the externally validated accuracy based on our European tertiary-care-hospital cohort exceeds the initial accuracy validated by Wenzel et al. It is noteworthy that in the current biopsied high-risk prostate cancer patient cohort, only 8% (n = 20) had a Gleason score of 3 + 5 and 1% (n = 1) had a score of 5 + 3. Consequently, most of the significant downgrading applied to downgrading from Gleason grade group 5 (Gleason scores of 4 + 5, 5 + 4, and 5 + 5). The current observations regarding significant downgrading could be even more important for patient consultations, since significant downgrading may influence treatment decisions, the prediction of the need for adjuvant radiotherapy, and post-RP management to an even greater extent than any downgrading at RP.
Taken together, the current results confirmed the frequent occurrence of downgrading in biopsied NCCN high-risk prostate cancer patients, with an any-downgrading rate of 51% and a significant-downgrading rate of 43% in a European high-volume tertiary-care RP center. Moreover, both differences and similarities in the tumor characteristics (cT stage, biopsy Gleason score, and RP Gleason score) of biopsied NCCN high-risk prostate cancer patients undergoing RP were observed between the European high-volume tertiary-care cohort and the epidemiological SEER-database cohort. Nevertheless, the external validation of the SEER-based nomogram's prediction of any downgrading and of significant downgrading resulted in good accuracy. Consequently, the generalizability of this nomogram may be suggested. For prostate cancer patients undergoing RP, prognosis and further treatment depend mainly on the RP Gleason score. Therefore, the risk of downgrading may affect treatment recommendations and patient counseling. In addition, a previous report found that downgrading was associated with a lower risk of biochemical recurrence [30].
Despite its novelty, the current study has limitations. First, we relied on a single-institution database with the retrospective inclusion of patients. Second, the assessment of biopsy and RP specimens may have differed across pathology institutes, since some of the prostate cancer patients were referred to our center from outpatient urology practices. Third, there was heterogeneity in the definition of the biopsy Gleason score, which may have biased the downgrading rates. Finally, some of our analyses and cut-off predictions may have been limited by the sample size, especially for significant downgrading.

Conclusions
The external validation of the SEER-based nomogram confirmed its ability to predict downgrading and significant downgrading in biopsied high-risk prostate cancer patients treated with RP within our European high-volume cohort. These findings support the potential role of the nomogram in treatment decisions and patient counseling.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.